Abstract

We present a method that uses distances between nearest neighbors in Takens space to estimate
the noise level. The method remains valid at high noise levels and has been verified by
estimating noise levels in several chaotic systems. We have analyzed the noise level for the Dow
Jones and DAX indexes and found that it ranges from 25 to 80 percent of the signal variance.
I. INTRODUCTION



It is common for observed data to be contaminated by noise (for reviews, see the literature
on methods of nonlinear time series analysis). The presence of noise can substantially
affect invariant system parameters such as the dimension, entropy or Lyapunov exponents.
In fact, Schreiber has shown that even 2% of noise can make a dimension calculation
misleading. It follows that assessing the noise level can be crucial for estimating
invariant system parameters. Even after performing noise reduction, one is interested in
evaluating the noise level in the cleaned data. In experiments, noise is often regarded
as a measurement uncertainty, which corresponds to a random variable added to the
temporary state of the system or to the experimental outcome. This kind of noise is usually
called measurement or additive noise. Another case is noise that influences the system
dynamics; it corresponds to the Langevin equation and can be called dynamical noise.
The second case is more difficult to analyze because the noise acting at the moment $t_0$
usually changes the trajectory for $t > t_0$. It follows that there is no clean trajectory;
instead, an $\epsilon$-shadowed trajectory occurs [4]. For real data, a signal (e.g. data from
a physical experiment or economic data) is subject to a mixture of both kinds of noise
(measurement and dynamical).

Schreiber has developed a method of noise level estimation [3] by evaluating the influence
of noise on the correlation dimension of the investigated system. The Schreiber method is valid
for rather small Gaussian measurement noise and requires values of the embedding dimension $d$,
the embedding delay $\tau$ and the characteristic dimension $r$ spanned by the system dynamics.

Diks investigated properties of the correlation integral with a Gaussian kernel in the
presence of noise. The Diks method makes use of a fitting function for correlation integrals
calculated from time series for different thresholds $\epsilon$. The function depends on the
system variables $K_2$ (correlation entropy), $D_2$ (correlation dimension), $\sigma$ (standard
deviation of the noise) and a normalizing constant. These four variables are estimated by
least squares fitting. The Diks method is valid for noise levels up to 25% of the signal
variance and for various measurement noise distributions. It requires optimal values of the
embedding dimension $d$, the embedding delay $\tau$ and the maximal threshold $\epsilon_c$.

Hsu et al. developed a method of noise reduction and used it for noise level estimation.
The method exploits the local-geometric-projection principle and is useful for various noise
distributions, but only for rather small noise levels. To use the method one needs to
choose the number of neighboring points to be considered, an appropriate number of iterations,
and optimal values of the parameters $d$ and $\tau$.

Oltmans et al. considered the influence of noise on the probability density function
$f_n(\epsilon)$, but they could take into account only small measurement noise. They fitted
$f_n(\epsilon)$ to the corresponding function, which was found for small $\epsilon$. Their
fitting function is similar to the probability density distribution that we obtain from the
correlation integrals $ADET_n(\epsilon)$. The method needs values of $d$, $\tau$ and
$\epsilon_c$ as input parameters.

In an earlier work we presented a method of noise level estimation by coarse-grained
correlation entropy (NECE). The crucial point of this method is fitting a proper function to
the estimated correlation entropy. This method does not require any input parameters such as
the embedding dimension $d$ or the embedding delay $\tau$. The minimal and maximal values of
the threshold parameter $\epsilon$ can be estimated automatically. The NECE method is used
below as the reference method.

In this paper we present another method for evaluating the noise level. The method
makes use of neighboring distances in the embedding space (NEND) and is introduced
in the next section. In the subsequent section we apply the method to stock
market data. Although it is a common belief that stock market behaviour is driven
by stochastic processes, it is difficult to separate the stochastic and deterministic
components of market dynamics. In fact, the deterministic fraction usually follows from
nonlinear effects and can possess a non-periodic or even chaotic characteristic. With
the help of the NEND and NECE methods we try to demonstrate that stock market
data are not purely stochastic and that a deterministic part can sometimes be dominant.



II. METHOD OF NOISE ESTIMATION BY USE OF NEIGHBORING DISTANCES IN TAKENS SPACE (NEND)

Let $\{x_i\}$, where $i = 1, 2, \ldots, N$, be a time series and
$\mathbf{x}_i = \{x_i, x_{i+\tau}, \ldots, x_{i+(d-1)\tau}\}$ the corresponding $d$-dimensional
vector constructed in the embedded space, where $d$ is the embedding dimension and $\tau$ is
the embedding delay. The method is based on the assumption that the minimal distance between
nearest neighbors is described by the standard deviation of the noise. The nearest neighbor
is found using the Euclidean norm, i.e. the distance is measured using the following formula:

$$dis_{i,j} = \sqrt{(x_i - x_j)^2 + (x_{i+\tau} - x_{j+\tau})^2 + \cdots + (x_{i+(d-1)\tau} - x_{j+(d-1)\tau})^2}. \qquad (1)$$

The nearest neighbor of the vector $\mathbf{x}_n$ is the vector $\mathbf{x}_j$ such that

$$\{\mathbf{x}_j : \forall_k\, \mathbf{x}_k,\ (k \neq j, n),\ dis_{n,j} < dis_{n,k}\}. \qquad (2)$$

We will assume that the distance between the vector $\mathbf{x}_n$ and its nearest neighbor
($dis^{NN}_n$) is calculated in a large embedding dimension $d \gg 1$.
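The construction of delay vectors and the nearest-neighbor search can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation; the function names `delay_embed` and `nearest_neighbor_distances`, the brute-force search and the toy series are assumptions made for the example.

```python
import numpy as np

def delay_embed(x, d, tau):
    """Return the delay vectors (x_i, x_{i+tau}, ..., x_{i+(d-1)tau}) as rows."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (d - 1) * tau
    if n_vectors < 2:
        raise ValueError("time series too short for this d and tau")
    return np.column_stack([x[k * tau : k * tau + n_vectors] for k in range(d)])

def nearest_neighbor_distances(vectors):
    """Euclidean distance from each vector to its nearest neighbor (brute force)."""
    sq_norms = np.sum(vectors ** 2, axis=1)
    # Squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * vectors @ vectors.T
    sq_dist = np.maximum(sq_dist, 0.0)     # guard against tiny negative round-off
    np.fill_diagonal(sq_dist, np.inf)      # a vector is not its own neighbor
    return np.sqrt(sq_dist.min(axis=1))

# Toy series (hypothetical values) embedded with d = 2, tau = 1
x = np.array([0.0, 1.0, 0.0, 1.0, 0.5, 1.0])
vectors = delay_embed(x, d=2, tau=1)
nn = nearest_neighbor_distances(vectors)
print(vectors.shape, nn)
```

The brute-force search costs $O(N^2)$ memory and time, which is acceptable for a few thousand points; a tree-based neighbor search would be a natural replacement for longer series.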

For linear systems without noise the minimal distance between nearest neighbors should
decrease as the number of data points in the time series increases, and for $N \to \infty$ this
distance tends to zero since the trajectory reaches the final periodic orbit. For deterministic
chaotic systems such minimal distances depend on the system entropy, but they also tend to zero
for an infinite number of data points, when the trajectory densely fills the chaotic attractor.
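The shrinking of the minimal distance with growing series length can be illustrated on a noise-free chaotic toy system. The sketch below assumes the logistic map with parameter 4 and a low embedding dimension ($d = 3$, $\tau = 1$) purely for speed; these choices and the helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def logistic_series(n, x0=0.3, r=4.0, discard=100):
    """Noise-free logistic map x -> r x (1 - x), with a discarded transient."""
    x = x0
    out = np.empty(n)
    for i in range(n + discard):
        x = r * x * (1.0 - x)
        if i >= discard:
            out[i - discard] = x
    return out

def min_nn_distance(x, d, tau):
    """Smallest Euclidean distance between any two delay vectors of x."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (d - 1) * tau
    v = np.column_stack([x[k * tau : k * tau + n] for k in range(d)])
    best = np.inf
    for i in range(n - 1):
        # distances from vector i to all later vectors (each pair visited once)
        best = min(best, np.linalg.norm(v[i + 1:] - v[i], axis=1).min())
    return best

series = logistic_series(1200)
lengths = (100, 400, 1200)
# Each shorter series is a prefix of the longer one, so the minimum cannot grow
mins = [min_nn_distance(series[:n], d=3, tau=1) for n in lengths]
for n, dist in zip(lengths, mins):
    print(n, dist)
```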

When Gaussian uncorrelated noise is added to an observed deterministic trajectory, the
corresponding minimal nearest-neighbor distance $dis^{NN}_{min}$ increases. For a large
embedding dimension $d \gg 1$ the distance can be estimated as the standard deviation of a
superposition of $2d$ independent stochastic variables, i.e.

$$dis^{NN}_{min} \approx \sqrt{2d}\,\sigma, \qquad (3)$$

where $\sigma$ is the standard deviation of the noise added to the signal.
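The $\sqrt{2d}\,\sigma$ scale can be checked numerically under a simplifying assumption: two embedded points whose noise-free parts coincide, so that their separation comes only from $2d$ independent Gaussian noise terms. The sketch below verifies the root-mean-square separation only; it does not model the minimum over all neighbors, and the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n_pairs = 50, 0.3, 20000   # illustrative values, not from the paper

# Independent noise on a vector and on its (noise-free identical) neighbor
noise_a = rng.normal(0.0, sigma, size=(n_pairs, d))
noise_b = rng.normal(0.0, sigma, size=(n_pairs, d))
distances = np.linalg.norm(noise_a - noise_b, axis=1)

rms_distance = np.sqrt(np.mean(distances ** 2))
predicted = np.sqrt(2 * d) * sigma
print(rms_distance, predicted)
```

Each squared coordinate difference has expectation $2\sigma^2$, so the sum over $d$ coordinates has expectation $2d\sigma^2$, which is where the factor $\sqrt{2d}$ comes from.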

The approximation (3) is valid only in the limit of very long time series and a large embedding
dimension $d$. If we generate surrogate data $\{surr_i\}$ by random shuffling of the original
data [14], this kind of surrogate preserves the mean, variance and histogram but removes
any determinism in the data. The minimal distance between nearest neighbors calculated in
the embedded space for surrogate data should be proportional to the standard deviation of the data.
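A shuffled surrogate is simple to generate. The following sketch, with an assumed sine-wave toy signal and hypothetical helper names, shows that shuffling keeps the set of values (and hence the mean, variance and histogram) while destroying the temporal correlation.

```python
import numpy as np

def shuffle_surrogate(x, rng):
    """Random permutation of the samples: same values, temporal order destroyed."""
    return rng.permutation(np.asarray(x, dtype=float))

def lag1_autocorr(series):
    """Lag-1 autocorrelation, used here as a crude indicator of temporal structure."""
    centered = series - series.mean()
    return np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered)

rng = np.random.default_rng(1)
t = np.arange(2000)
x = np.sin(0.05 * t)                  # strongly deterministic toy signal
surrogate = shuffle_surrogate(x, rng)

print(lag1_autocorr(x), lag1_autocorr(surrogate))
```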

Now let us define the Noise-To-Signal ratio as the ratio of $\sigma$ to the standard
deviation of the data $\sigma_{data}$:

$$NTS = \frac{\sigma}{\sigma_{data}}. \qquad (4)$$

In the first step of the method we calculate all distances between nearest neighbors in the
original and in the surrogate data. Then we search for the smallest distance in each data set:
$dis^{NN}_{min} = \min_n \{dis^{NN}_n\}$ and $dis^{surr,NN}_{min} = \min_n \{dis^{surr,NN}_n\}$.
Using the approximation (3), i.e. the linear dependence of the distance $dis^{NN}_{min}$ on the
noise level, we can introduce the output parameter of the method, $ADET_d$, which is related to
the Noise-To-Signal ratio (NTS) as follows:

$$NTS \approx ADET_d = \frac{dis^{NN}_{min}}{\langle dis^{surr,NN}_{min} \rangle}. \qquad (5)$$

Here $\langle dis^{surr,NN}_{min} \rangle$ denotes the average over $m$ realizations of the
surrogate data ($m$ is a parameter of the method).
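The steps of the method can be put together in a short sketch. It is an illustration under stated assumptions rather than the authors' code: the delay $\tau = 1$, the short series length, the reduced number of surrogates and the names `min_nn_distance` and `nend_ratio` are all choices made for this example.

```python
import numpy as np

def min_nn_distance(x, d, tau):
    """Smallest Euclidean distance between any two delay vectors of x."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (d - 1) * tau
    v = np.column_stack([x[k * tau : k * tau + n] for k in range(d)])
    best = np.inf
    for i in range(n - 1):
        # distances from vector i to all later vectors (each pair visited once)
        best = min(best, np.linalg.norm(v[i + 1:] - v[i], axis=1).min())
    return best

def nend_ratio(x, d=9, tau=1, m=20, seed=0):
    """Original minimal distance divided by its average over m shuffled surrogates."""
    rng = np.random.default_rng(seed)
    original = min_nn_distance(x, d, tau)
    surrogate = [min_nn_distance(rng.permutation(x), d, tau) for _ in range(m)]
    return original / np.mean(surrogate)

rng = np.random.default_rng(42)
t = np.arange(600)
clean = np.sin(2 * np.pi * t / 50)               # periodic, noise-free toy signal
white = rng.normal(size=600)                     # pure noise, no determinism
noisy = clean + rng.normal(scale=0.4, size=600)  # mixture of both

ratio_clean = nend_ratio(clean, m=10)
ratio_white = nend_ratio(white, m=10)
ratio_noisy = nend_ratio(noisy, m=10)
print(ratio_clean, ratio_noisy, ratio_white)
```

For the noise-free periodic signal the ratio should be close to zero, and for pure noise close to one, since the original and shuffled series are then statistically equivalent. With such a short series and only ten surrogates the intermediate value is a rough indication only, in line with the limits of validity of approximation (3).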



III. NOISE ESTIMATION: EXAMPLES AND APPLICATION TO STOCK MARKET DATA

The NEND method described in the previous section is very simple to use. Its drawback
is a large error in the estimated noise level for short time series and for embedding
dimensions $d$ that are too low. The estimation error also increases if fewer random
surrogate data sets are used for the averaging in formula (5). Table I presents examples
of noise estimation by the NEND method in comparison with the results of the NECE
method. The estimation error of the NEND method is based on the standard deviation
of the temporary values of $ADET_d$ for different realizations of the surrogate data. One can
see that the NEND method, despite its simplicity, works quite well for the considered cases.
The NECE method gives better accuracy than the NEND method, but it is much more
sophisticated and more difficult to implement on a computer. The CPU times required
are comparable for both methods.
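One possible reading of this error estimate is sketched below: the ratio is evaluated separately for each surrogate realization and the sample standard deviation of these values is reported as the error. This interpretation, the delay $\tau = 1$, the white-noise test series and the function names are assumptions for the example.

```python
import numpy as np

def min_nn_distance(x, d, tau):
    """Smallest Euclidean distance between any two delay vectors of x."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (d - 1) * tau
    v = np.column_stack([x[k * tau : k * tau + n] for k in range(d)])
    best = np.inf
    for i in range(n - 1):
        best = min(best, np.linalg.norm(v[i + 1:] - v[i], axis=1).min())
    return best

def nend_with_error(x, d=9, tau=1, m=20, seed=0):
    """Return (estimate, error): averaged ratio and spread of per-surrogate ratios."""
    rng = np.random.default_rng(seed)
    original = min_nn_distance(x, d, tau)
    surr = np.array([min_nn_distance(rng.permutation(x), d, tau) for _ in range(m)])
    estimate = original / surr.mean()
    per_realization = original / surr       # one "temporary" value per surrogate
    return estimate, per_realization.std(ddof=1)

rng = np.random.default_rng(7)
x = rng.normal(size=500)                    # white-noise test series
estimate, error = nend_with_error(x, m=8)
print(estimate, error)
```

Using fewer surrogates makes both the averaged ratio and its spread less reliable, which is consistent with the growth of the estimation error for small $m$.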

Both methods were applied to evaluate the noise levels in stock market data. Here we
present results for the Dow Jones Industrial Average (DJIA) during the period 1896-2002
(daily returns, see Fig. 1) and the DAX (German Stock Market Index) during the period
1998-2000 (4-minute returns, see Fig. 4). Returns are defined as

$$R_n = \ln \frac{P_n}{P_{n-1}}, \qquad (6)$$

where $P_n$ is the value of an index at the time $n$. Noise levels for both indexes are in
the range $NTS \approx 0.5$-$0.9$, as one can see in Figs. 2, 3, 5 and 6. It follows that the
considered stock market data are not purely stochastic, because the percentage of determinism
ranges over $(1 - NTS^2) \cdot 100\% \approx 20$-$75\%$ and the stochastic part is about
25-80%. In Figs. 2 and 5 we present noise levels $ADET_9$ calculated with the NEND method for
the DJIA and DAX indexes, respectively. For comparison, in Figs. 3 and 6 we show noise levels
NTS calculated with the NECE method. The NEND method and the reference NECE method give
similar results. Since the two methods are different approaches, the similar results suggest
good accuracy of both. Some differences in the noise level estimates between the two methods
appear in periods of increased volatility (1930-1940 for the DJIA and from August to November
of 1998 for the DAX), which suggests that extreme events have a different impact in the two
methods. We think that the NECE method gives more relevant results in high-volatility regions
and that the NEND method underestimates the noise level in such cases.

TABLE I: Results of the noise level estimation for different systems. In the case of the NEND
method we used $d = 9$, $m = 20$ and $N = 3000$.
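The return series and the split into deterministic and stochastic parts used in this section can be sketched as follows. The log-return convention $\ln(P_n / P_{n-1})$, the function names and the price values are assumptions made for illustration; no real index data are used.

```python
import numpy as np

def log_returns(prices):
    """Logarithmic returns ln(P_n / P_{n-1}) of a price or index series."""
    p = np.asarray(prices, dtype=float)
    return np.log(p[1:] / p[:-1])

def determinism_percent(nts):
    """Percentage of determinism (1 - NTS^2) * 100 for a Noise-To-Signal ratio."""
    return (1.0 - np.asarray(nts, dtype=float) ** 2) * 100.0

prices = np.array([100.0, 101.0, 99.0, 102.0])   # made-up index values
returns = log_returns(prices)

# End points of the reported NTS range, 0.5 and 0.9
shares = determinism_percent([0.5, 0.9])         # approximately 75 and 19 percent
print(returns, shares)
```

The two end points reproduce the range quoted in the text: $NTS = 0.5$ corresponds to about 75% determinism (25% stochastic part) and $NTS = 0.9$ to about 19% determinism (81% stochastic part).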



IV. CONCLUSIONS 



In conclusion, we have developed a new method of noise level estimation from time series.
The method makes use of the minimal distance between nearest neighbors in Takens
space. The method has been tested on several systems and has given results similar
to those of the NECE reference method, but it is much easier to implement on a computer.
Applying the method to stock market data gives a percentage of noise ranging from 25
to 80% of the signal variance.

FIG. 1: Plot of daily returns of Dow Jones Index (1896-2002).

FIG. 2: Plot of noise levels $ADET_d$ for $d = 9$ calculated with the NEND method and the value
of Dow Jones Index (1896-2002).

FIG. 3: Plot of noise levels NTS calculated with the NECE method and the value of Dow Jones
Index (1896-2002).

FIG. 4: Plot of 4-minute returns of DAX Index (1998-2000).

FIG. 5: Plot of noise levels $ADET_d$ for $d = 9$ calculated with the NEND method and the value
of DAX Index (1998-2000).

FIG. 6: Plot of noise levels NTS calculated with the NECE method and the value of DAX Index
(1998-2000).